# 03. Experience Replay

## Summary

When the agent interacts with the environment, the sequence of experience tuples can be highly correlated. A naive Q-learning algorithm that learns from these experience tuples in sequential order risks being swayed by the effects of this correlation. By instead keeping a replay buffer and using experience replay to sample from the buffer at random, we can prevent the action values from oscillating or diverging catastrophically.

The replay buffer contains a collection of experience tuples (S, A, R, S'). Tuples are gradually added to the buffer as we interact with the environment.

The act of sampling a small batch of tuples from the replay buffer in order to learn is known as experience replay. In addition to breaking harmful correlations, experience replay allows us to learn from individual tuples multiple times, recall rare occurrences, and in general make better use of our experience.
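As a rough illustration of these two ideas, here is a minimal sketch of a replay buffer in Python. The class name, buffer size, and batch size are illustrative choices, not part of the lesson; a fixed-length `deque` simply evicts the oldest tuples once the buffer is full, and `random.sample` draws a uniformly random mini-batch.

```python
import random
from collections import deque, namedtuple

# Hypothetical container for one (S, A, R, S') experience tuple.
Experience = namedtuple("Experience",
                        ["state", "action", "reward", "next_state", "done"])

class ReplayBuffer:
    """Fixed-size buffer that stores experience tuples for random sampling."""

    def __init__(self, buffer_size=100_000, batch_size=64):
        # When full, appending a new tuple evicts the oldest one.
        self.memory = deque(maxlen=buffer_size)
        self.batch_size = batch_size

    def add(self, state, action, reward, next_state, done):
        """Add one experience tuple as the agent interacts with the environment."""
        self.memory.append(Experience(state, action, reward, next_state, done))

    def sample(self):
        """Draw a random mini-batch, breaking the temporal correlation
        between consecutive tuples."""
        return random.sample(self.memory, k=self.batch_size)

    def __len__(self):
        return len(self.memory)
```

In a typical training loop, the agent would call `add(...)` at every time step and, once `len(buffer) >= batch_size`, periodically call `sample()` and run a learning update on the resulting batch; any given tuple can be sampled many times before it is evicted.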

## Quiz

Which of the following are true? Select all that apply.

SOLUTION:
  • Experience replay is based on the idea that we can learn better if we do multiple passes over the same experience.
  • Experience replay is used to generate uncorrelated experience data for online training of deep RL agents.